Search | BVS Regional Portal

Expert-centered Evaluation of Deep Learning Algorithms for Brain Tumor Segmentation.

Hoebel, Katharina V; Bridge, Christopher P; Ahmed, Sara; Akintola, Oluwatosin; Chung, Caroline; Huang, Raymond Y; Johnson, Jason M; Kim, Albert; Ly, K Ina; Chang, Ken; Patel, Jay; Pinho, Marco; Batchelor, Tracy T; Rosen, Bruce R; Gerstner, Elizabeth R; Kalpathy-Cramer, Jayashree.

Radiol Artif Intell ; 6(1): e220231, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38197800

ABSTRACT

Purpose: To present results from a literature survey on practices in deep learning segmentation algorithm evaluation and perform a study on expert quality perception of brain tumor segmentation. Materials and Methods: A total of 180 articles reporting on brain tumor segmentation algorithms were surveyed for the reported quality evaluation. Additionally, ratings of segmentation quality on a four-point scale were collected from medical professionals for 60 brain tumor segmentation cases. Results: Of the surveyed articles, Dice score, sensitivity, and Hausdorff distance were the most popular metrics to report segmentation performance. Notably, only 2.8% of the articles included clinical experts' evaluation of segmentation quality. The experimental results revealed a low interrater agreement (Krippendorff α, 0.34) in experts' segmentation quality perception. Furthermore, the correlations between the ratings and commonly used quantitative quality metrics were low (Kendall tau between Dice score and mean rating, 0.23; Kendall tau between Hausdorff distance and mean rating, 0.51), with large variability among the experts. Conclusion: The results demonstrate that quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences, and existing metrics do not capture the clinical perception of segmentation quality. Keywords: Brain Tumor Segmentation, Deep Learning Algorithms, Glioblastoma, Cancer, Machine Learning. Clinical trial registration nos. NCT00756106 and NCT00662506. Supplemental material is available for this article. © RSNA, 2023.

Subjects

Brain Neoplasms , Deep Learning , Glioblastoma , Humans , Algorithms , Benchmarking , Brain Neoplasms/diagnostic imaging , Glioblastoma/diagnostic imaging
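
The abstract above compares an overlap metric (Dice score) with expert ratings through a rank correlation (Kendall tau). As a minimal sketch of both quantities, assuming binary masks flattened to 0/1 sequences and the tau-a variant (the abstract does not state which variant was used), the function names below are hypothetical and are not taken from the article:

```python
from itertools import combinations


def dice_score(pred, ref):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p and r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    # Two empty masks are treated as a perfect match (an assumption).
    return 1.0 if total == 0 else 2.0 * intersection / total


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

In practice one would compute a Dice score per case and then correlate the list of per-case scores with the list of mean expert ratings for the same cases.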

Not without Context-A Multiple Methods Study on Evaluation and Correction of Automated Brain Tumor Segmentations by Experts.

Hoebel, Katharina V; Bridge, Christopher P; Kim, Albert; Gerstner, Elizabeth R; Ly, Ina K; Deng, Francis; DeSalvo, Matthew N; Dietrich, Jorg; Huang, Raymond; Huang, Susie Y; Pomerantz, Stuart R; Vagvala, Saivenkat; Rosen, Bruce R; Kalpathy-Cramer, Jayashree.

Acad Radiol ; 31(4): 1572-1582, 2024 Apr.

Article in English | MEDLINE | ID: mdl-37951777

ABSTRACT

RATIONALE AND OBJECTIVES: Brain tumor segmentations are integral to the clinical management of patients with glioblastoma, the deadliest primary brain tumor in adults. The manual delineation of tumors is time-consuming and highly provider-dependent. These two problems must be addressed by introducing automated, deep-learning-based segmentation tools. This study aimed to identify criteria experts use to evaluate the quality of automatically generated segmentations and their thought processes as they correct them. MATERIALS AND METHODS: Multiple methods were used to develop a detailed understanding of the complex factors that shape experts' perception of segmentation quality and their thought processes in correcting proposed segmentations. Data from a questionnaire and semistructured interview with neuro-oncologists and neuroradiologists were collected between August and December 2021 and analyzed using a combined deductive and inductive approach. RESULTS: Brain tumors are highly complex and ambiguous segmentation targets. Therefore, physicians rely heavily on the given context related to the patient and clinical context in evaluating the quality and need to correct brain tumor segmentation. Most importantly, the intended clinical application determines the segmentation quality criteria and editing decisions. Physicians' personal beliefs and preferences about the capabilities of AI algorithms and whether questionable areas should not be included are additional criteria influencing the perception of segmentation quality and appearance of an edited segmentation. CONCLUSION: Our findings on experts' perceptions of segmentation quality will allow the design of improved frameworks for expert-centered evaluation of brain tumor segmentation models. In particular, the knowledge presented here can inspire the development of brain tumor-specific metrics for segmentation model training and evaluation.

Subjects

Brain Neoplasms , Glioblastoma , Adult , Humans , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Algorithms , Glioblastoma/pathology , Pattern Recognition, Automated/methods , Tumor Burden , Magnetic Resonance Imaging/methods , Image Processing, Computer-Assisted/methods

FDU-Net: Deep Learning-Based Three-Dimensional Diffuse Optical Image Reconstruction.

Deng, Bin; Gu, Hanxue; Zhu, Hongmin; Chang, Ken; Hoebel, Katharina V; Patel, Jay B; Kalpathy-Cramer, Jayashree; Carp, Stefan A.

IEEE Trans Med Imaging ; 42(8): 2439-2450, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37028063

ABSTRACT

Near-infrared diffuse optical tomography (DOT) is a promising functional modality for breast cancer imaging; however, the clinical translation of DOT is hampered by technical limitations. Specifically, conventional finite element method (FEM)-based optical image reconstruction approaches are time-consuming and ineffective in recovering full lesion contrast. To address this, we developed a deep learning-based reconstruction model (FDU-Net) comprised of a Fully connected subnet, followed by a convolutional encoder-Decoder subnet, and a U-Net for fast, end-to-end 3D DOT image reconstruction. The FDU-Net was trained on digital phantoms that include randomly located singular spherical inclusions of various sizes and contrasts. Reconstruction performance was evaluated in 400 simulated cases with realistic noise profiles for the FDU-Net and conventional FEM approaches. Our results show that the overall quality of images reconstructed by FDU-Net is significantly improved compared to FEM-based methods and a previously proposed deep-learning network. Importantly, once trained, FDU-Net demonstrates substantially better capability to recover true inclusion contrast and location without using any inclusion information during reconstruction. The model was also generalizable to multi-focal and irregularly shaped inclusions unseen during training. Finally, FDU-Net, trained on simulated data, could successfully reconstruct a breast tumor from a real patient measurement. Overall, our deep learning-based approach demonstrates marked superiority over the conventional DOT image reconstruction methods while also offering over four orders of magnitude acceleration in computational time. Once adapted to the clinical breast imaging workflow, FDU-Net has the potential to provide real-time accurate lesion characterization by DOT to assist the clinical diagnosis and management of breast cancer.

Subjects

Breast Neoplasms , Deep Learning , Humans , Female , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional , Phantoms, Imaging , Breast Neoplasms/diagnostic imaging , Algorithms

Radiomics Repeatability Pitfalls in a Scan-Rescan MRI Study of Glioblastoma.

Hoebel, Katharina V; Patel, Jay B; Beers, Andrew L; Chang, Ken; Singh, Praveer; Brown, James M; Pinho, Marco C; Batchelor, Tracy T; Gerstner, Elizabeth R; Rosen, Bruce R; Kalpathy-Cramer, Jayashree.

Radiol Artif Intell ; 3(1): e190199, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33842889

ABSTRACT

PURPOSE: To determine the influence of preprocessing on the repeatability and redundancy of radiomics features extracted using a popular open-source radiomics software package in a scan-rescan glioblastoma MRI study. MATERIALS AND METHODS: In this study, a secondary analysis of T2-weighted fluid-attenuated inversion recovery (FLAIR) and T1-weighted postcontrast images from 48 patients (mean age, 56 years [range, 22-77 years]) diagnosed with glioblastoma were included from two prospective studies (ClinicalTrials.gov NCT00662506 [2009-2011] and NCT00756106 [2008-2011]). All patients underwent two baseline scans 2-6 days apart using identical imaging protocols on 3-T MRI systems. No treatment occurred between scan and rescan, and tumors were essentially unchanged visually. Radiomic features were extracted by using PyRadiomics (https://pyradiomics.readthedocs.io/) under varying conditions, including normalization strategies and intensity quantization. Subsequently, intraclass correlation coefficients were determined between feature values of the scan and rescan. RESULTS: Shape features showed a higher repeatability than intensity (adjusted P < .001) and texture features (adjusted P < .001) for both T2-weighted FLAIR and T1-weighted postcontrast images. Normalization improved the overlap between the region of interest intensity histograms of scan and rescan (adjusted P < .001 for both T2-weighted FLAIR and T1-weighted postcontrast images), except in scans where brain extraction fails. As such, normalization significantly improves the repeatability of intensity features from T2-weighted FLAIR scans (adjusted P = .003 [z score normalization] and adjusted P = .002 [histogram matching]). The use of a relative intensity binning strategy as opposed to default absolute intensity binning reduces correlation between gray-level co-occurrence matrix features after normalization. 
CONCLUSION: Both normalization and intensity quantization have an effect on the level of repeatability and redundancy of features, emphasizing the importance of both accurate reporting of methodology in radiomics articles and understanding the limitations of choices made in pipeline design. Supplemental material is available for this article. © RSNA, 2020. See also the commentary by Tiwari and Verma in this issue.
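
The scan-rescan repeatability described above is quantified with intraclass correlation coefficients. The abstract does not say which ICC form was used, so the following is only a sketch assuming the one-way random-effects ICC(1,1); the function name and input layout (one list of repeated values per patient) are hypothetical:

```python
def icc_one_way(measurements):
    """One-way random-effects ICC(1,1).

    `measurements` is a list with one entry per subject; each entry is a
    list of k repeated values of the same feature (e.g. scan and rescan).
    """
    n = len(measurements)
    k = len(measurements[0])
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n

    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (value - mean) ** 2
        for row, mean in zip(measurements, subject_means)
        for value in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_between - ms_within) / denominator
```

A feature whose rescan value equals its scan value for every patient yields an ICC of 1; values near or below 0 indicate that within-patient variation is as large as the variation between patients.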

Multi-Institutional Assessment and Crowdsourcing Evaluation of Deep Learning for Automated Classification of Breast Density.

Chang, Ken; Beers, Andrew L; Brink, Laura; Patel, Jay B; Singh, Praveer; Arun, Nishanth T; Hoebel, Katharina V; Gaw, Nathan; Shah, Meesam; Pisano, Etta D; Tilkin, Mike; Coombs, Laura P; Dreyer, Keith J; Allen, Bibb; Agarwal, Sheela; Kalpathy-Cramer, Jayashree.

J Am Coll Radiol ; 17(12): 1653-1662, 2020 Dec.

Article in English | MEDLINE | ID: mdl-32592660

ABSTRACT

OBJECTIVE: We developed deep learning algorithms to automatically assess BI-RADS breast density. METHODS: Using a large multi-institution patient cohort of 108,230 digital screening mammograms from the Digital Mammographic Imaging Screening Trial, we investigated the effect of data, model, and training parameters on overall model performance and provided crowdsourcing evaluation from the attendees of the ACR 2019 Annual Meeting. RESULTS: Our best-performing algorithm achieved good agreement with radiologists who were qualified interpreters of mammograms, with a four-class κ of 0.667. When training was performed with randomly sampled images from the data set versus sampling equal number of images from each density category, the model predictions were biased away from the low-prevalence categories such as extremely dense breasts. The net result was an increase in sensitivity and a decrease in specificity for predicting dense breasts for equal class compared with random sampling. We also found that the performance of the model degrades when we evaluate on digital mammography data formats that differ from the one that we trained on, emphasizing the importance of multi-institutional training sets. Lastly, we showed that crowdsourced annotations, including those from attendees who routinely read mammograms, had higher agreement with our algorithm than with the original interpreting radiologists. CONCLUSION: We demonstrated the possible parameters that can influence the performance of the model and how crowdsourcing can be used for evaluation. This study was performed in tandem with the development of the ACR AI-LAB, a platform for democratizing artificial intelligence.

Subjects

Breast Neoplasms , Crowdsourcing , Deep Learning , Artificial Intelligence , Breast Density , Breast Neoplasms/diagnostic imaging , Female , Humans , Mammography
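
The agreement figure reported above is a four-class κ. The abstract does not specify whether a weighted or unweighted statistic was used, so this sketch assumes plain unweighted Cohen's kappa between two label sequences; the function name is hypothetical:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two raters' categorical labels."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)
```

Here the two sequences would be the model's predicted BI-RADS density category and the interpreting radiologist's category for the same mammograms.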

Siamese neural networks for continuous disease severity evaluation and change detection in medical imaging.

Li, Matthew D; Chang, Ken; Bearce, Ben; Chang, Connie Y; Huang, Ambrose J; Campbell, J Peter; Brown, James M; Singh, Praveer; Hoebel, Katharina V; Erdogmus, Deniz; Ioannidis, Stratis; Palmer, William E; Chiang, Michael F; Kalpathy-Cramer, Jayashree.

NPJ Digit Med ; 3: 48, 2020.

Article in English | MEDLINE | ID: mdl-32258430

ABSTRACT

Using medical images to evaluate disease severity and change over time is a routine and important task in clinical decision making. Grading systems are often used, but are unreliable as domain experts disagree on disease severity category thresholds. These discrete categories also do not reflect the underlying continuous spectrum of disease severity. To address these issues, we developed a convolutional Siamese neural network approach to evaluate disease severity at single time points and change between longitudinal patient visits on a continuous spectrum. We demonstrate this in two medical imaging domains: retinopathy of prematurity (ROP) in retinal photographs and osteoarthritis in knee radiographs. Our patient cohorts consist of 4861 images from 870 patients in the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) cohort study and 10,012 images from 3021 patients in the Multicenter Osteoarthritis Study (MOST), both of which feature longitudinal imaging data. Multiple expert clinician raters ranked 100 retinal images and 100 knee radiographs from excluded test sets for severity of ROP and osteoarthritis, respectively. The Siamese neural network output for each image in comparison to a pool of normal reference images correlates with disease severity rank (ρ = 0.87 for ROP and ρ = 0.89 for osteoarthritis), both within and between the clinical grading categories. Thus, this output can represent the continuous spectrum of disease severity at any single time point. The difference in these outputs can be used to show change over time. Alternatively, paired images from the same patient at two time points can be directly compared using the Siamese neural network, resulting in an additional continuous measure of change between images. Importantly, our approach does not require manual localization of the pathology of interest and requires only a binary label for training (same versus different). 
The location of disease and site of change detected by the algorithm can be visualized using an occlusion sensitivity map-based approach. For a longitudinal binary change detection task, our Siamese neural networks achieve test set areas under the receiver operating characteristic curve (AUCs) of up to 0.90 in evaluating ROP or knee osteoarthritis change, depending on the change detection strategy. The overall performance on this binary task is similar to that of a conventional convolutional deep neural network trained for multi-class classification. Our results demonstrate that convolutional Siamese neural networks can be a powerful tool for evaluating the continuous spectrum of disease severity and change in medical imaging.
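
The abstract above describes two uses of a Siamese network's output: comparing an image against a pool of normal reference images, and comparing two images of the same patient directly. The sketch below illustrates only the distance step and assumes that image embeddings have already been produced by some trained network (not shown). Using Euclidean distance and taking the median over the pool are assumptions made for illustration, and the function names are hypothetical:

```python
import math
from statistics import median


def severity_score(embedding, normal_pool):
    """Median Euclidean distance from one embedding to a pool of
    embeddings of normal reference images (larger = less normal)."""
    return median(math.dist(embedding, ref) for ref in normal_pool)


def change_score(embedding_t0, embedding_t1):
    """Euclidean distance between embeddings of the same patient
    at two time points, used as a continuous measure of change."""
    return math.dist(embedding_t0, embedding_t1)
```

For example, `severity_score([3.0, 4.0], [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])` takes the median of the distances 5, 5, and 0.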

Machine Learning Models can Detect Aneurysm Rupture and Identify Clinical Features Associated with Rupture.

Silva, Michael A; Patel, Jay; Kavouridis, Vasileios; Gallerani, Troy; Beers, Andrew; Chang, Ken; Hoebel, Katharina V; Brown, James; See, Alfred P; Gormley, William B; Aziz-Sultan, Mohammad Ali; Kalpathy-Cramer, Jayashree; Arnaout, Omar; Patel, Nirav J.

World Neurosurg ; 131: e46-e51, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31295616

ABSTRACT

BACKGROUND: Machine learning (ML) has been increasingly used in medicine and neurosurgery. We sought to determine whether ML models can distinguish ruptured from unruptured aneurysms and identify features associated with rupture. METHODS: We performed a retrospective review of patients with intracranial aneurysms detected on vascular imaging at our institution between 2002 and 2018. The dataset was used to train 3 ML models (random forest, linear support vector machine [SVM], and radial basis function kernel SVM). Relative contributions of individual predictors were derived from the linear SVM model. RESULTS: Complete data were available for 845 aneurysms in 615 patients. Ruptured aneurysms (n = 309, 37%) were larger (mean 6.51 mm vs. 5.73 mm; P = 0.02) and more likely to be in the posterior circulation (20% vs. 11%; P < 0.001) than unruptured aneurysms. Area under the receiver operating curve was 0.77 for the linear SVM, 0.78 for the radial basis function kernel SVM models, and 0.81 for the random forest model. Aneurysm location and size were the 2 features that contributed most significantly to the model. Posterior communicating artery, anterior communicating artery, and posterior inferior cerebellar artery locations were most highly associated with rupture, whereas paraclinoid and middle cerebral artery locations had the strongest association with unruptured status. CONCLUSIONS: ML models are capable of accurately distinguishing ruptured from unruptured aneurysms and identifying features associated with rupture. Consistent with prior studies, location and size show the strongest association with aneurysm rupture.

Subjects

Aneurysm, Ruptured/diagnosis , Intracranial Aneurysm/diagnosis , Machine Learning , Adult , Aged , Aneurysm, Ruptured/diagnostic imaging , Aneurysm, Ruptured/epidemiology , Case-Control Studies , Comorbidity , Diabetes Mellitus/epidemiology , Female , Humans , Hyperlipidemias/epidemiology , Hypertension/epidemiology , Intracranial Aneurysm/diagnostic imaging , Intracranial Aneurysm/epidemiology , Male , Middle Aged , Retrospective Studies , Sensitivity and Specificity , Smoking/epidemiology , Support Vector Machine
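
The models above are compared by area under the receiver operating characteristic curve. As a minimal sketch of what that number measures, the AUC can be computed as the fraction of (ruptured, unruptured) pairs in which the ruptured case receives the higher model score, with ties counted as half. The function name and the 1/0 label coding are assumptions for illustration and are not taken from the article:

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    `labels` uses 1 for the positive class (e.g. ruptured) and 0 for
    the negative class; `scores` are the model outputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a dataset of a few hundred aneurysms; a rank-based formulation would be preferable for much larger sets.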
